@arhik
Contributor

@arhik arhik commented Jan 18, 2026

This commit adds support for scan (parallel prefix sum) operations to cuTile,
based on the IntegerReduce branch and commit 0c9ab90.

Key changes:

  • Added encode_ScanOp! to bytecode encodings for generating ScanOp bytecode
  • Added encode_scan_identity_array! to reuse existing identity encoding
  • Added scan intrinsic implementation using operation_identity from IntegerReduce
  • Added scan() and cumsum() public APIs with proper 1-indexed to 0-indexed axis conversion
  • Added comprehensive codegen tests for scan operations
  • Added scankernel.jl example demonstrating a two-pass chained scan algorithm
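
For readers unfamiliar with multi-pass scans, here is a minimal sequential sketch of the general two-pass idea in plain Python. It is not the code from scankernel.jl, and the name `two_pass_scan` and its `tile_size` parameter are hypothetical. It assumes the input is split into fixed-size tiles: the first pass scans each tile on its own and records the tile total, and the second pass adds the sum of all preceding tiles to each tile.

```python
from itertools import accumulate


def two_pass_scan(values, tile_size):
    """Inclusive prefix sum computed tile by tile in two passes (illustrative only)."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    tiles = [values[i:i + tile_size] for i in range(0, len(values), tile_size)]

    # Pass 1: scan each tile independently and record its total.
    local_scans = [list(accumulate(tile)) for tile in tiles]
    tile_totals = [scan[-1] for scan in local_scans]

    # Exclusive scan of the tile totals: offsets[k] is the sum of all tiles before k.
    offsets = [0] + list(accumulate(tile_totals))[:-1]

    # Pass 2: add each tile's offset to its local scan.
    result = []
    for scan, offset in zip(local_scans, offsets):
        result.extend(x + offset for x in scan)
    return result


print(two_pass_scan([1, 2, 3, 4, 5, 6, 7], tile_size=3))
```

On a GPU the tiles would be handled by separate blocks running concurrently, so the ordering between the two passes has to be enforced explicitly. This sequential model sidesteps that entirely.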

Features:

  • Supports cumulative sum (cumsum) for float and integer types
  • Supports both forward and reverse scan directions
  • Reuses FloatIdentityVal and IntegerIdentityVal from IntegerReduce
  • Uses operation_identity function for cleaner identity value creation
  • 1-indexed axis parameter (consistent with reduce operations)
  • Preserves tile shape (scan is an element-wise operation along one dimension)
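
The semantics listed above can be modeled in a few lines of plain Python. This is a reference sketch only, not the cuTile API: `scan_2d` and its parameters are hypothetical names, the tile is a list of rows, and "reverse" is assumed to mean accumulating from the last element toward the first.

```python
from itertools import accumulate
import operator


def scan_2d(tile, axis, op=operator.add, reverse=False):
    """Inclusive scan of a 2-D list-of-rows along a 1-indexed axis (illustrative only)."""
    if axis not in (1, 2):
        raise ValueError("axis must be 1 or 2 for a 2-D tile")
    axis0 = axis - 1  # 1-indexed (Julia-style) -> 0-indexed

    def scan_line(line):
        line = list(line)
        if reverse:
            # Accumulate from the far end, then restore the original order.
            return list(accumulate(line[::-1], op))[::-1]
        return list(accumulate(line, op))

    if axis0 == 1:
        # Scan within each row.
        return [scan_line(row) for row in tile]
    # axis0 == 0: scan down each column, then transpose back to rows.
    columns = [scan_line(col) for col in zip(*tile)]
    return [list(row) for row in zip(*columns)]


tile = [[1, 2, 3], [4, 5, 6]]
print(scan_2d(tile, axis=2))                # [[1, 3, 6], [4, 9, 15]]
print(scan_2d(tile, axis=1))                # [[1, 2, 3], [5, 7, 9]]
print(scan_2d(tile, axis=2, reverse=True))  # [[6, 5, 3], [15, 11, 6]]
```

In every case the output has the same shape as the input, which is the shape-preserving property noted in the list.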

Tests:

  • All 142 codegen tests pass (including 6 new scan tests)

@arhik arhik marked this pull request as ready for review January 18, 2026 05:29
@arhik
Contributor Author

arhik commented Jan 18, 2026

This depends on PR #37.

This commit adds support for scan (parallel prefix sum) operations to cuTile,
based on the IntegerReduce branch and commit 0c9ab90.

Key changes:
- Added encode_ScanOp! to bytecode encodings for generating ScanOp bytecode
- Added encode_scan_identity_array! to reuse existing identity encoding
- Added scan intrinsic implementation using operation_identity from IntegerReduce
- Added scan() and cumsum() public APIs with proper 1-indexed to 0-indexed axis conversion
- Added comprehensive codegen tests for scan operations
- Added scankernel.jl example demonstrating CSDL scan algorithm

Features:
- Supports cumulative sum (cumsum) for float and integer types
- Supports both forward and reverse scan directions
- Reuses FloatIdentityOp and IntegerIdentityOp from IntegerReduce
- Uses operation_identity function for cleaner identity value creation
- 1-indexed axis parameter (consistent with reduce operations)
- Preserves tile shape (scan is an element-wise operation along one dimension)
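
As background on why a scan needs an identity value at all, here is a small plain-Python sketch. The `IDENTITY` table and `exclusive_scan` are hypothetical stand-ins for illustration, not the `operation_identity` function or the identity types from the PR. It shows one common use: seeding an exclusive scan with a value that leaves the result unchanged and matches the element type.

```python
from itertools import accumulate
import operator

# Hypothetical lookup table; keyed by (operation, element type).
IDENTITY = {
    (operator.add, int): 0,
    (operator.add, float): 0.0,
    (operator.mul, int): 1,
    (operator.mul, float): 1.0,
}


def exclusive_scan(values, op, elem_type):
    """Exclusive scan: output[i] combines all elements strictly before i."""
    identity = IDENTITY[(op, elem_type)]
    # Seed with the identity, then drop the final total so the
    # output has the same length as the input.
    return list(accumulate(values, op, initial=identity))[:-1]


print(exclusive_scan([1, 2, 3, 4], operator.add, int))  # [0, 1, 3, 6]
print(exclusive_scan([2, 3, 4], operator.mul, int))     # [1, 2, 6]
```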

Tests:
- All 142 codegen tests pass (including 6 new scan tests)
- scankernel.jl example runs successfully with the CSDL algorithm

Minor comment improvements in scankernel.jl example

- Clarify that it demonstrates device-side scan operation
- Add note that test might occasionally fail (race condition in phase 2 loop)
@arhik
Contributor Author

arhik commented Jan 30, 2026

This will fail. Still uses IdentityOp.

@arhik
Contributor Author

arhik commented Jan 30, 2026

@maleadt Thanks.

@maleadt
Member

maleadt commented Jan 30, 2026

I moved the example into the test suite, if you don't mind. I'd rather keep the examples for high-level examples and/or ports of cutile Python examples.

@maleadt maleadt merged commit 0ee0155 into JuliaGPU:main Jan 30, 2026
8 checks passed
@arhik
Contributor Author

arhik commented Jan 30, 2026

> I moved the example into the test suite, if you don't mind. I'd rather keep the examples for high-level examples and/or ports of cutile Python examples.

Completely agree. I updated comments too.
